Balancing Energy and Performance in Dense Linear System Solvers for Hybrid ARM+GPU platforms
نویسندگان
چکیده
The high performance computing community has traditionally focused uniquely on the reduction of execution time, though in the last years, the optimization of energy consumption has become a main issue. A reduction of energy usage without a degradation of performance requires the adoption of energy-efficient hardware platforms accompanied by the development of energy-aware algorithms and computational kernels. The solution of linear systems is a key operation for many scientific and engineering problems. Its relevance has motivated an important amount of work, and consequently, it is possible to find high performance solvers for a wide variety of hardware platforms. In this work, we aim to develop a high performance and energy-efficient linear system solver. In particular, we develop two solvers for a low-power CPU-GPU platform, the NVIDIA Jetson TK1. These solvers implement the Gauss-Huard algorithm yielding an efficient usage of the target hardware as well as an efficient memory access. The experimental evaluation shows that the novel proposal reports important savings in both time and energy-consumption when compared with the state-of-the-art solvers of the platform.
منابع مشابه
Heterogeneous Sparse Matrix Computations on Hybrid GPU/CPU Platforms
Hybrid GPU/CPU clusters are becoming very popular in the scientific computing community, as attested by the number of such systems present in the Top 500 list. In this paper, we address one of the key algorithms for scientific applications: the computation of sparse matrix-vector products that lies at the heart of iterative solvers for sparse linear systems. We detail how design patterns for sp...
متن کاملA Proposal for a Heterogeneous Cluster ScaLAPACK (Dense Linear Solvers)
ÐIn this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplication or linear system solvers, on heterogeneous networks of workstations. The uniform block-cyclic data distribution scheme commonly used for homogeneous collections of processors limits the performance of these linear algebra kernels on heterogeneous grids to the speed of the slowest processor...
متن کاملSolvers on advanced parallel architectures with application to partial differential equations and discrete optimisation
This thesis investigates techniques for the solution of partial differential equations (PDE) on advanced parallel architectures comprising central processing units (CPU) and graphics processing units (GPU). Many physical phenomena studied by scientists and engineers aremodelled with PDEs, and these are often computationally expensive to solve. This is one of the main drivers of large-scale comp...
متن کاملA Class of Communication-avoiding Algorithms for Solving General Dense Linear Systems on CPU/GPU Parallel Machines
We study several solvers for the solution of general linear systems where the main objective is to reduce the communication overhead due to pivoting. We first describe two existing algorithms for the LU factorization on hybrid CPU/GPU architectures. The first one is based on partial pivoting and the second uses a random preconditioning of the original matrix to avoid pivoting. Then we introduce...
متن کاملDesign and Optimization of OpenFOAM-based CFD Applications for Modern Hybrid and Heterogeneous HPC Platforms
Design and Optimization of OpenFOAM-based CFD Applications for Modern Hybrid and Heterogeneous HPC Platforms Amani AlOnazi The progress of high performance computing platforms is dramatic, and most of the simulations carried out on these platforms result in improvements on one level, yet expose shortcomings of current CFD packages. Therefore, hardware-aware design and optimizations are crucial ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CLEI Electron. J.
دوره 19 شماره
صفحات -
تاریخ انتشار 2016